Robust Sound Modeling for Song Detection in Broadcast Audio
Authors: Cano et al.
Abstract
This paper describes the development of an audio fingerprint called AudioDNA, designed to be robust against several distortions, including those related to radio broadcasting. A complete system is described, which also covers a fast and efficient method for comparing observed fingerprints against a huge database of reference fingerprints. The promising results achieved with a first prototype system monitoring music titles as well as commercials are presented.

INTRODUCTION

A monitoring system able to automatically generate playlists of registered songs can be a valuable tool for copyright enforcement organizations and for companies reporting statistics on broadcast music. Technology that listens to audio in order to identify songs, without using external metadata or embedded watermarks, is commonly known as Audio Fingerprinting. In a general Audio Fingerprinting scheme, the system generates a unique fingerprint of the audio material based on an analysis of the acoustic properties of the audio itself. This fingerprint is then compared against a database of fingerprints to identify the song.

The difficulty of identifying broadcast audio material stems mainly from the difference in quality between the original titles, usually stored on audio CD, and their broadcast versions. A song may be transmitted only partially, the presenter may talk over parts of it, the piece may be played faster, and several manipulation effects are applied to increase its psychoacoustic impact on the listener (compressors, enhancers, equalization, bass boosters, etc.). Moreover, broadcast audio streams contain no markers indicating the beginning and end of songs. Such a system also has to be fast, because it must compare the incoming audio against several thousand songs. This constrains memory and computation requirements, since the system should monitor several radio stations, deliver results online, and not be too expensive in terms of hardware.
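The difficulty described above, spotting a song inside a continuous broadcast stream with no start or end markers while tolerating distortions, can be illustrated with a deliberately naive Python sketch. It assumes each recording has already been reduced to a string of discrete fingerprint symbols. This is an assumption: the excerpt does not specify how AudioDNA is represented. The sketch slides a fixed-length window over the stream and accepts a match when the fraction of mismatching symbols stays below a threshold. `scan_stream`, `window`, and `max_mismatch` are hypothetical names, not part of the described system.

```python
from dataclasses import dataclass


@dataclass
class Match:
    track_id: str         # reference whose excerpt best matched the window
    stream_pos: int       # index in the stream where the matching window starts
    mismatch_rate: float  # fraction of symbols that differ (0.0 = identical)


def hamming(a: str, b: str) -> int:
    """Count positions where two equal-length symbol strings differ."""
    return sum(x != y for x, y in zip(a, b))


def scan_stream(stream: str, references: dict[str, str],
                window: int, max_mismatch: float) -> list[Match]:
    """Slide a fixed-length window over an unsegmented stream (song
    boundaries are unknown) and compare it with every window-length
    excerpt of every reference fingerprint, keeping the closest one."""
    matches = []
    for pos in range(len(stream) - window + 1):
        chunk = stream[pos:pos + window]
        best_id, best_dist = None, window + 1
        for track_id, ref in references.items():
            for off in range(len(ref) - window + 1):
                dist = hamming(chunk, ref[off:off + window])
                if dist < best_dist:
                    best_id, best_dist = track_id, dist
        # Accept the window only if few enough symbols were distorted.
        if best_id is not None and best_dist / window <= max_mismatch:
            matches.append(Match(best_id, pos, best_dist / window))
    return matches


refs = {"song_a": "ABCDEFGHIJ", "song_b": "KLMNOPQRST"}
stream = "zzzz" + "CDXFGH" + "zzzz"  # excerpt of song_a with one distorted symbol
for m in scan_stream(stream, refs, window=6, max_mismatch=0.2):
    print(m)
```

This exhaustive comparison of every window against every reference is exactly what a real system cannot afford with several thousand songs and several stations, which is why the text calls for a fast matching method.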
This paper describes one approach to the problem. After introducing the concept of audio fingerprinting, it proposes an audio model designed to be robust to the distortions of an adverse environment: radio broadcast. Along with an explanation of the fingerprint matching algorithms, we present some preliminary results and conclusions.

CANO ET AL., ROBUST MODELING FOR SONG DETECTION, AES 112 CONVENTION, MUNICH, GERMANY, 2002 MAY 10–13

AUDIO FINGERPRINTING

Fingerprinting, or content-based identification (CBID), technologies work by extracting acoustically relevant characteristics of a piece of audio content and storing them in a database. When an unidentified piece of audio content is presented, its characteristics are calculated and matched against those stored in the database. Using complex matching algorithms and acoustic fingerprints, different versions of a single recording can be identified as the same music title [1], [2].

This differs from an alternative existing solution for monitoring audio content: Audio Watermarking. In Audio Watermarking [3], psychoacoustic research is applied so that an arbitrary message, the watermark, can be embedded in a recording without altering the perception of the sound. In Audio Fingerprinting, the message is automatically derived from the perceptually most relevant components of the sound. In theory, this makes it less vulnerable to attacks and distortions, since trying to modify this message, the fingerprint, means altering the quality of the sound [4].

Regardless of the specific approach used to extract the compact content-based signature, a common architecture can be devised to describe the functionality of fingerprinting [1].

[Figure: common fingerprinting architecture; metadata fields shown include Track Id and Artist]
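The common architecture has three parts: a fingerprint extraction stage, a database that pairs reference fingerprints with metadata such as Track Id and Artist, and a matching stage. A toy Python sketch can illustrate it. The sketch assumes extraction has already produced strings of discrete symbols. As a stand-in for the "complex matching algorithms" mentioned above, it uses n-gram voting over an inverted index, a generic indexing technique swapped in for illustration. It is not the method of this paper or of [1]. `FingerprintDB`, `Metadata`, `identify`, `n`, and `min_votes` are hypothetical names.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Metadata:
    track_id: str
    artist: str


class FingerprintDB:
    """Toy reference database: each fingerprint is indexed by its symbol
    n-grams, so candidate tracks are found without scanning every entry."""

    def __init__(self, n: int = 4):
        self.n = n
        self.meta = {}                # track_id -> Metadata
        self.index = defaultdict(set)  # n-gram -> set of track_ids containing it

    def _ngrams(self, fp: str) -> set:
        return {fp[i:i + self.n] for i in range(len(fp) - self.n + 1)}

    def add(self, fp: str, meta: Metadata) -> None:
        """Store an already-extracted fingerprint together with its metadata."""
        self.meta[meta.track_id] = meta
        for gram in self._ngrams(fp):
            self.index[gram].add(meta.track_id)

    def identify(self, query: str, min_votes: int = 2) -> Optional[Metadata]:
        """Each n-gram shared with a reference votes for that track; return
        the best-voted track's metadata, or None if the evidence is too weak."""
        votes = Counter()
        for gram in self._ngrams(query):
            for track_id in self.index.get(gram, ()):
                votes[track_id] += 1
        if not votes:
            return None
        track_id, count = votes.most_common(1)[0]
        return self.meta[track_id] if count >= min_votes else None


db = FingerprintDB(n=3)
db.add("ABCDEFGHIJ", Metadata("t1", "Artist A"))
db.add("KLMNOPQRST", Metadata("t2", "Artist B"))
print(db.identify("CDEXGHI"))  # distorted excerpt of t1
```

Because a query only touches the index entries for its own n-grams, lookup cost grows with the query length rather than with the number of stored songs. A verification step such as an approximate alignment would normally follow the vote.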
Similar sources
Audio Identification Using Sinusoidal Modeling and Application to Jingle Detection
This article presents a new descriptor dedicated to Audio Identification (audioID), based on sinusoidal modeling. The core idea is an appropriate selection of the sinusoidal components of the signal to be detected. This new descriptor is robust against usual distortions found in audioID tasks. It has several advantages compared to classical subband-based descriptors including an increased robus...
Acoustic quality assessment at Nezamol molk dome of Jame mosque of Isfahan
Incontrovertibly, the sense of hearing is one of the five most substantial human senses. In fact, the human ear receives sound and transmits to the human brain by the auditory organs. Hence, sound can be considered as one of the key tools of human communication with each other and the environment around them. Since acoustic has a profound impact on the body, soul, and the performance of human ...
Robust Fault Detection on Boiler-turbine Unit Actuators Using Dynamic Neural Networks
Due to the important role of the boiler-turbine units in industries and electricity generation, it is important to diagnose different types of faults in different parts of boiler-turbine system. Different parts of a boiler-turbine system like the sensor or actuator or plant can be affected by various types of faults. In this paper, the effects of the occurrence of faults on the actuators are in...
Robust Sound Event Detection in Continuous Audio Environments
Sound event detection in real world environments has attracted significant research interest recently because of its applications in popular fields such as machine hearing and automated surveillance, as well as in sound scene understanding. This paper considers continuous robust sound event detection, which means multiple overlapped sound events in different types of interfering noise. First, ...
AdaMast: A Drum Sound Recognizer based on Adaptation and Matching of Spectrogram Templates
This paper describes a template-matching-based system, called AdaMast, that detects onset times of the bass drum, snare drum, and hi-hat cymbals in polyphonic audio signals of popular songs. AdaMast uses the power spectrograms of the drum sounds as templates. However, there are two main problems in transcribing drum sounds in the presence of other sounds. The first problem is that actual drum-s...
Publication date: 2002